Enriching SCFG rules directly from efficient bilingual chart parsing

Authors

  • Martin Cmejrek
  • Bowen Zhou
  • Bing Xiang
Abstract

In this paper, we propose a new method for training translation rules for a Synchronous Context-free Grammar (SCFG). A bilingual chart parser is used to generate the parse forest, and the EM algorithm is used to estimate expected counts for each rule in the rule set. Additional rules are constructed as combinations of reliable rules occurring in the parse forest. The new method of proposing additional translation rules is independent of word alignments. We present the theoretical background for this method, and initial experimental results on German-English translations of Europarl data.
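The expected-count step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the parse forest is given as an acyclic hypergraph and uses the generic inside-outside computation; the function name `expected_rule_counts`, the edge format, and the toy forest are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import prod


def expected_rule_counts(edges, root, prob):
    """Inside-outside over an acyclic parse forest (illustrative only).

    edges: list of (head, rule, tails) hyperedges, ordered bottom-up so that
           every edge deriving a node precedes any edge using it as a tail.
    prob:  dict mapping rule id -> current rule probability.
    Assumes the forest has at least one derivation with nonzero probability.
    Returns a dict mapping rule id -> expected count in this forest.
    """
    # Inside pass: total probability mass of all derivations below each node.
    inside = defaultdict(float)
    for head, rule, tails in edges:
        inside[head] += prob[rule] * prod(inside[t] for t in tails)

    # Outside pass: mass of everything surrounding each node, top-down.
    outside = defaultdict(float)
    outside[root] = 1.0
    for head, rule, tails in reversed(edges):
        for i, t in enumerate(tails):
            others = prod(inside[u] for j, u in enumerate(tails) if j != i)
            outside[t] += outside[head] * prob[rule] * others

    # Posterior mass of each hyperedge, accumulated per rule.
    counts = defaultdict(float)
    for head, rule, tails in edges:
        edge_mass = outside[head] * prob[rule] * prod(inside[t] for t in tails)
        counts[rule] += edge_mass / inside[root]
    return dict(counts)


# Hypothetical toy forest: S is derived either by r1 over A and B, or by r2.
forest = [
    ("A", "rA", ()),
    ("B", "rB", ()),
    ("S", "r1", ("A", "B")),
    ("S", "r2", ()),
]
prob = {"rA": 0.5, "rB": 0.4, "r1": 0.5, "r2": 0.1}
counts = expected_rule_counts(forest, "S", prob)
```

In this toy forest the two derivations of `S` carry equal probability mass, so each rule receives an expected count of about 0.5. In an EM loop, such counts would be summed over all sentence pairs and renormalized to obtain updated rule probabilities.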

Similar resources

Two Methods for Extending Hierarchical Rules from the Bilingual Chart Parsing

This paper studies two methods for training hierarchical MT rules independently of word alignments. Bilingual chart parsing and the EM algorithm are used to train bitext correspondences. The first method, rule arithmetic, constructs new rules as combinations of existing and reliable rules used in the bilingual chart, significantly improving the translation accuracy on the German-English and Farsi-E...

Chart Parsing and Constraint Programming

In this paper, parsing-as-deduction and constraint programming are brought together to outline a procedure for the specification of constraint-based chart parsers. Following the proposal in Shieber et al. (1995), we show how to directly realize the inference rules for deductive parsers as Constraint Handling Rules (Frühwirth, 1998) by viewing the items of a chart parser as constraints and the c...

Acquiring a Stochastic Context-Free Grammar from the Penn Treebank

In this paper we present preliminary results of investigating the structure of the Penn Treebank and how these results can be used in probabilistic parsing of English. The Penn Treebank is a corpus of 4.9 million part-of-speech (POS) tagged words and 2.9 million words of skeletally parsed data developed by the University of Pennsylvania (see [8]). By matching skeletal parse files with POS-tagged files w...

An Extended GHKM Algorithm for Inducing λ-SCFG

Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λ-SCFG) provides an effective mechanism for semantic parsing, how to learn such λ-SCFG rules still remains a challenge because of the d...

Bilingual Markov Reordering Labels for Hierarchical SMT

Earlier work on labeling Hiero grammars with monolingual syntax reports improved performance, suggesting that such labeling may impact phrase reordering as well as lexical selection. In this paper we explore the idea of inducing bilingual labels for Hiero grammars without using any resources beyond those that the original Hiero itself uses. Our bilingual labels aim at capturing salient patterns...

Publication date: 2009